MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Faucet: streaming de novo assembly graph construction

Internal identifier: 000D84 (Main/Exploration); previous: 000D83; next: 000D85

Authors: Roye Rozov [Israel]; Gil Goldshlager [United States]; Eran Halperin [United States]; Ron Shamir [Israel]

RBID : PMC:5870852

French descriptors

English descriptors

Abstract

Motivation

We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.
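
The streaming idea above can be illustrated with a minimal sketch. It is not Faucet's implementation: the function names `stream_reads`, `add_read` and `build_kmer_set` are hypothetical, the input is assumed to be plain four-line FASTQ, and an exact Python set stands in for whatever compact structure a real assembler would use. The point is only that each read is consumed from the stream, contributes its k-mers, and is then discarded.

```python
import io
from typing import Iterable, Iterator, Set


def stream_reads(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record, one read at a time."""
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the second line of every record holds the bases
            yield line.strip()


def add_read(kmer_set: Set[str], read: str, k: int) -> None:
    """Add every k-mer of one read to the set; the read itself is not kept."""
    for i in range(len(read) - k + 1):
        kmer_set.add(read[i:i + k])


def build_kmer_set(lines: Iterable[str], k: int) -> Set[str]:
    """Consume a stream of FASTQ lines and return the set of k-mers seen."""
    kmer_set: Set[str] = set()
    for read in stream_reads(lines):
        add_read(kmer_set, read, k)
    return kmer_set


# Toy input: any line iterator works, e.g. a file or a network response.
fastq = io.StringIO("@r1\nACGTA\n+\nIIIII\n@r2\nCGTAC\n+\nIIIII\n")
kmer_set = build_kmer_set(fastq, 3)
print(sorted(kmer_set))
```

Because `build_kmer_set` only needs an iterator over lines, the same code would run over a download in progress, which is the property the abstract describes.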

Results

Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14–110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.
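
To make "coverage counts collected at junction k-mers" concrete, here is a toy sketch under stated assumptions. The abstract does not define a junction, so the sketch assumes a junction is a k-mer with more than one successor or more than one predecessor in the de Bruijn graph. It also ignores reverse complements and uses an exact set of k-mers. All function names are hypothetical and this is not Faucet's code.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple

BASES = "ACGT"


def kmers(read: str, k: int) -> List[str]:
    """Return all k-mers of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def successors(kmer: str, kmer_set: Set[str]) -> List[str]:
    """K-mers in the set that overlap this k-mer's (k-1)-suffix."""
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmer_set]


def predecessors(kmer: str, kmer_set: Set[str]) -> List[str]:
    """K-mers in the set that overlap this k-mer's (k-1)-prefix."""
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmer_set]


def find_junctions(reads: Sequence[str], k: int) -> Tuple[Set[str], Counter]:
    # Pass 1: collect the node set of the de Bruijn graph.
    kmer_set: Set[str] = set()
    for read in reads:
        kmer_set.update(kmers(read, k))

    # Assumed definition: a junction has out-degree > 1 or in-degree > 1.
    junctions = {
        km for km in kmer_set
        if len(successors(km, kmer_set)) > 1
        or len(predecessors(km, kmer_set)) > 1
    }

    # Pass 2: count how often each junction k-mer occurs in the reads.
    coverage: Counter = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in junctions:
                coverage[km] += 1
    return junctions, coverage


junctions, coverage = find_junctions(["ACGTA", "ACGTC", "ACGTA"], 3)
print(junctions, dict(coverage))
```

In this toy input the k-mer `CGT` is followed by both `GTA` and `GTC`, so it is the only junction, and it is counted once per read.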

Availability and implementation

Faucet is available at https://github.com/Shamir-Lab/Faucet

Supplementary information

Supplementary data are available at Bioinformatics online.


Url: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870852
DOI: 10.1093/bioinformatics/btx471
PubMed: 29036597
PubMed Central: 5870852


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Faucet: streaming
<italic>de novo</italic>
assembly graph construction</title>
<author>
<name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation wicri:level="1">
<nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation wicri:level="2">
<nlm:aff id="btx471-aff2">Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation wicri:level="2">
<nlm:aff id="btx471-aff3">Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1">
<nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">29036597</idno>
<idno type="pmc">5870852</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870852</idno>
<idno type="RBID">PMC:5870852</idno>
<idno type="doi">10.1093/bioinformatics/btx471</idno>
<date when="2017">2017</date>
<idno type="wicri:Area/Pmc/Corpus">000B20</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Corpus" wicri:corpus="PMC">000B20</idno>
<idno type="wicri:Area/Pmc/Curation">000B20</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Curation">000B20</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000854</idno>
<idno type="wicri:explorRef" wicri:stream="Pmc" wicri:step="Checkpoint">000854</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="RBID">pubmed:29036597</idno>
<idno type="wicri:Area/PubMed/Corpus">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">000B15</idno>
<idno type="wicri:Area/PubMed/Curation">000B15</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">000B15</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000914</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Checkpoint">000914</idno>
<idno type="wicri:Area/Ncbi/Merge">001C18</idno>
<idno type="wicri:Area/Ncbi/Curation">001C18</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001C18</idno>
<idno type="wicri:Area/Main/Merge">000D87</idno>
<idno type="wicri:Area/Main/Curation">000D84</idno>
<idno type="wicri:Area/Main/Exploration">000D84</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">Faucet: streaming
<italic>de novo</italic>
assembly graph construction</title>
<author>
<name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
<affiliation wicri:level="1">
<nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
<affiliation wicri:level="2">
<nlm:aff id="btx471-aff2">Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA</wicri:regionArea>
<placeName>
<region type="state">Massachusetts</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
<affiliation wicri:level="2">
<nlm:aff id="btx471-aff3">Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Departments of Computer Science, Anesthesiology and Perioperative Medicine, University of California Los Angeles, CA</wicri:regionArea>
<placeName>
<region type="state">Californie</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
<affiliation wicri:level="1">
<nlm:aff id="btx471-aff1">Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv, Israel</nlm:aff>
<country xml:lang="fr">Israël</country>
<wicri:regionArea>Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv</wicri:regionArea>
<wicri:noRegion>Tel Aviv</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bioinformatics</title>
<idno type="ISSN">1367-4803</idno>
<idno type="eISSN">1367-4811</idno>
<imprint>
<date when="2017">2017</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Genomics (methods)</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Microbiota (genetics)</term>
<term>Sequence Analysis, DNA (methods)</term>
<term>Software</term>
</keywords>
<keywords scheme="KwdFr" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN ()</term>
<term>Génomique ()</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Microbiote (génétique)</term>
<term>Métagénome</term>
</keywords>
<keywords scheme="MESH" qualifier="genetics" xml:lang="en">
<term>Microbiota</term>
</keywords>
<keywords scheme="MESH" qualifier="génétique" xml:lang="fr">
<term>Microbiote</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Genomics</term>
<term>Sequence Analysis, DNA</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
<term>Humans</term>
<term>Metagenome</term>
<term>Software</term>
</keywords>
<keywords scheme="MESH" xml:lang="fr">
<term>Algorithmes</term>
<term>Analyse de séquence d'ADN</term>
<term>Génomique</term>
<term>Humains</term>
<term>Logiciel</term>
<term>Métagénome</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<title>Abstract</title>
<sec id="s1">
<title>Motivation</title>
<p>We present Faucet, a two-pass streaming algorithm for assembly graph construction. Faucet builds an assembly graph incrementally as each read is processed. Thus, reads need not be stored locally, as they can be processed while downloading data and then discarded. We demonstrate this functionality by performing streaming graph assembly of publicly available data, and observe that the ratio of disk use to raw data size decreases as coverage is increased.</p>
</sec>
<sec id="s2">
<title>Results</title>
<p>Faucet pairs the de Bruijn graph obtained from the reads with additional metadata derived from them. We show that these metadata (coverage counts collected at junction k-mers and connections bridging between junction pairs) contain most salient information needed for assembly, and demonstrate that they enable cleaning of metagenome assembly graphs, greatly improving contiguity while maintaining accuracy. We compared Faucet's resource use and assembly quality to state-of-the-art metagenome assemblers, as well as leading resource-efficient genome assemblers. Faucet used orders of magnitude less time and disk space than the specialized metagenome assemblers MetaSPAdes and Megahit, while also improving on their memory use; this broadly matched the performance of other assemblers optimizing resource efficiency, namely Minia and LightAssembler. However, on the metagenomes tested, Faucet's outputs had 14–110% higher mean NGA50 lengths compared with Minia, and 2- to 11-fold higher mean NGA50 lengths compared with LightAssembler, the only other streaming assembler available.</p>
</sec>
<sec id="s3">
<title>Availability and implementation</title>
<p>Faucet is available at
<ext-link ext-link-type="uri" xlink:href="https://github.com/Shamir-Lab/Faucet">https://github.com/Shamir-Lab/Faucet</ext-link>
</p>
</sec>
<sec id="s5">
<title>
<xref ref-type="supplementary-material" rid="sup1">Supplementary information</xref>
</title>
<p>
<xref ref-type="supplementary-material" rid="sup1">Supplementary data</xref>
are available at
<italic>Bioinformatics</italic>
online.</p>
</sec>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Israël</li>
<li>États-Unis</li>
</country>
<region>
<li>Californie</li>
<li>Massachusetts</li>
</region>
</list>
<tree>
<country name="Israël">
<noRegion>
<name sortKey="Rozov, Roye" sort="Rozov, Roye" uniqKey="Rozov R" first="Roye" last="Rozov">Roye Rozov</name>
</noRegion>
<name sortKey="Shamir, Ron" sort="Shamir, Ron" uniqKey="Shamir R" first="Ron" last="Shamir">Ron Shamir</name>
</country>
<country name="États-Unis">
<region name="Massachusetts">
<name sortKey="Goldshlager, Gil" sort="Goldshlager, Gil" uniqKey="Goldshlager G" first="Gil" last="Goldshlager">Gil Goldshlager</name>
</region>
<name sortKey="Halperin, Eran" sort="Halperin, Eran" uniqKey="Halperin E" first="Eran" last="Halperin">Eran Halperin</name>
</country>
</tree>
</affiliations>
</record>
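
As an alternative to the Dilib tools, the author-by-country tree at the end of the record can be read with a general-purpose XML parser. The sketch below is an illustration only: it uses Python's standard `xml.etree.ElementTree` on a simplified copy of the `<affiliations>` element, and the helper name `authors_by_country` is hypothetical. The full record uses prefixed names such as `wicri:` and `nlm:` without namespace declarations, so a strict parser would likely reject it as-is. That is why only this prefix-free fragment is parsed here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Simplified copy of the <affiliations><tree> element from the record above.
FRAGMENT = """
<affiliations>
  <tree>
    <country name="Israël">
      <noRegion>
        <name sortKey="Rozov, Roye">Roye Rozov</name>
      </noRegion>
      <name sortKey="Shamir, Ron">Ron Shamir</name>
    </country>
    <country name="États-Unis">
      <region name="Massachusetts">
        <name sortKey="Goldshlager, Gil">Gil Goldshlager</name>
      </region>
      <name sortKey="Halperin, Eran">Eran Halperin</name>
    </country>
  </tree>
</affiliations>
"""


def authors_by_country(xml_text: str) -> Dict[str, List[str]]:
    """Map each country name to the authors listed anywhere beneath it."""
    root = ET.fromstring(xml_text.strip())
    result: Dict[str, List[str]] = {}
    # Only look under <tree>; the sibling <list> element has no author names.
    for country in root.iterfind("./tree/country"):
        result[country.get("name")] = [n.text for n in country.iter("name")]
    return result


for country, authors in authors_by_country(FRAGMENT).items():
    print(country, authors)
```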

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000D84 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000D84 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:5870852
   |texte=   Faucet: streaming de novo assembly graph construction
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:29036597" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021